AITopics | generation network

Collaborating Authors

generation network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Geometry-Aware Neural Rendering

Joshua Tobin, Wojciech Zaremba, Pieter Abbeel

Neural Information Processing SystemsFeb-13-2026, 10:55:05 GMT

Understanding the 3-dimensional structure of the world is a core challenge in computer vision androbotics. Neural rendering approaches learnanimplicit 3D model by predicting what acamera would see from an arbitrary viewpoint.

artificial intelligence, machine learning, representation, (19 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Geometry-Aware Neural Rendering

Joshua Tobin, Wojciech Zaremba, Pieter Abbeel

Neural Information Processing SystemsAug-19-2025, 22:47:52 GMT

Agents can learn 3D structure implicitly (e.g., using end-to-end reinforcement learning [

arxiv preprint arxiv, dataset, representation, (11 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Add feedback

EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training

Shao, Liangjing, Bai, Linxin, Du, Chenkang, Chen, Xinrong

arXiv.org Artificial IntelligenceJun-23-2025

Monocular depth estimation and ego-motion estimation are significant tasks for scene perception and navigation in stable, accurate and efficient robot-assisted endoscopy. To tackle lighting variations and sparse textures in endoscopic scenes, multiple techniques including optical flow, appearance flow and intrinsic image decomposition have been introduced into the existing methods. However, the effective training strategy for multiple modules are still critical to deal with both illumination issues and information interference for self-supervised depth estimation in endoscopy. Therefore, a novel framework with multistep efficient finetuning is proposed in this work. In each epoch of end-to-end training, the process is divided into three steps, including optical flow registration, multiscale image decomposition and multiple transformation alignments. At each step, only the related networks are trained without interference of irrelevant information. Based on parameter-efficient finetuning on the foundation model, the proposed method achieves state-of-the-art performance on self-supervised depth estimation on SCARED dataset and zero-shot depth estimation on Hamlyn dataset, with 4\%$\sim$10\% lower error. The evaluation code of this work has been published on https://github.com/BaymaxShao/EndoMUST.

artificial intelligence, image understanding, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2506.16017

Country: Asia (0.28)

Genre:

Research Report (0.50)
Workflow (0.48)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Layer Separation: Adjustable Joint Space Width Images Synthesis in Conventional Radiography

Wang, Haolin, Ou, Yafei, Ambalathankandy, Prasoon, Ota, Gen, Dai, Pengyu, Ikebe, Masayuki, Suzuki, Kenji, Kamishima, Tamotsu

arXiv.org Artificial IntelligenceFeb-3-2025

Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by joint inflammation and progressive structural damage. Joint space width (JSW) is a critical indicator in conventional radiography for evaluating disease progression, which has become a prominent research topic in computer-aided diagnostic (CAD) systems. However, deep learning-based radiological CAD systems for JSW analysis face significant challenges in data quality, including data imbalance, limited variety, and annotation difficulties. This work introduced a challenging image synthesis scenario and proposed Layer Separation Networks (LSN) to accurately separate the soft tissue layer, the upper bone layer, and the lower bone layer in conventional radiographs of finger joints. Using these layers, the adjustable JSW images can be synthesized to address data quality challenges and achieve ground truth (GT) generation. Experimental results demonstrated that LSN-based synthetic images closely resemble real radiographs, and significantly enhanced the performance in downstream tasks. The code and dataset will be available.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2502.01972

Country:

Asia > Japan > Hokkaidō > Hokkaidō Prefecture > Sapporo (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

An Unforgeable Publicly Verifiable Watermark for Large Language Models

Liu, Aiwei, Pan, Leyi, Hu, Xuming, Li, Shu'ang, Wen, Lijie, King, Irwin, Yu, Philip S.

arXiv.org Artificial IntelligenceDec-11-2023

However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. Meanwhile, the token embedding parameters are shared between the generation and detection networks, which makes the detection network achieve a high accuracy very efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks with a minimized number of parameters. Subsequent analysis confirms the high complexity involved in forging the watermark from the detection network. Therefore, texts generated by LLMs need to be detected and tagged. At present, some watermarking algorithms for LLM have proved successful in making machinegenerated texts detectable by adding implicit features during the text generation process that are difficult for humans to discover but easily detected by the specially designed method (Christ et al., 2023; Kirchenbauer et al., 2023). The current watermark algorithms for large models utilize a shared key during the generation and detection of watermarks. They work well when the detection access is restricted to the watermark owner only. However, in many situations, when third-party watermark detection is required, the exposure of the shared key would enable others to forge the watermark. Therefore, preventing the watermark forge in the public detection setting, is of great importance. In this work, we propose the first unforgeable publicly verifiable watermarking algorithm for large language models (LLMs).

algorithm, generation network, watermark, (15 more...)

arXiv.org Artificial Intelligence

2307.1623

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > South Carolina (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Low-complexity Deep Video Compression with A Distributed Coding Architecture

Zhang, Xinjie, Shao, Jiawei, Zhang, Jun

arXiv.org Artificial IntelligenceApr-2-2023

Prevalent predictive coding-based video compression methods rely on a heavy encoder to reduce temporal redundancy, which makes it challenging to deploy them on resource-constrained devices. Since the 1970s, distributed source coding theory has indicated that independent encoding and joint decoding with side information (SI) can achieve high-efficient compression of correlated sources. This has inspired a distributed coding architecture aiming at reducing the encoding complexity. However, traditional distributed coding methods suffer from a substantial performance gap to predictive coding ones. Inspired by the great success of learning-based compression, we propose the first end-to-end distributed deep video compression framework to improve the rate-distortion performance. A key ingredient is an effective SI generation module at the decoder, which helps to effectively exploit inter-frame correlations without computation-intensive encoder-side motion estimation and compensation. Experiments show that our method significantly outperforms conventional distributed video coding and H.264. Meanwhile, it enjoys 6-7x encoding speedup against DVC [1] with comparable compression performance. Code is released at https://github.com/Xinjie-Q/Distributed-DVC.

artificial intelligence, machine learning, video, (18 more...)

arXiv.org Artificial Intelligence

2303.11599

Country:

Asia > China > Hong Kong (0.04)
South America > Uruguay > Artigas > Artigas (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision (0.66)

Add feedback

Unsupervised Industrial Anomaly Detection via Pattern Generative and Contrastive Networks

Huang, Jianfeng, Li, Chenyang, Lin, Yimin, Lian, Shiguo

arXiv.org Artificial IntelligenceJul-20-2022

It is hard to collect enough flaw images for training deep learning network in industrial production. Therefore, existing industrial anomaly detection methods prefer to use CNN-based unsupervised detection and localization network to achieve this task. However, these methods always fail when there are varieties happened in new signals since traditional end-to-end networks suffer barriers of fitting nonlinear model in high-dimensional space. Moreover, they have a memory library by clustering the feature of normal images essentially, which cause it is not robust to texture change. To this end, we propose the Vision Transformer based (VIT-based) unsupervised anomaly detection network. It utilizes a hierarchical task learning and human experience to enhance its interpretability. Our network consists of pattern generation and comparison networks. Pattern generation network uses two VIT-based encoder modules to extract the feature of two consecutive image patches, then uses VIT-based decoder module to learn the human designed style of these features and predict the third image patch. After this, we use the Siamese-based network to compute the similarity of the generation image patch and original image patch. Finally, we refine the anomaly localization by the bi-directional inference strategy. Comparison experiments on public dataset MVTec dataset show our method achieves 99.8% AUC, which surpasses previous state-of-the-art methods. In addition, we give a qualitative illustration on our own leather and cloth datasets. The accurate segment results strongly prove the accuracy of our method in anomaly detection.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2207.09792

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Graph-Guided Deformation for Point Cloud Completion

Shi, Jieqi, Xu, Lingyun, Heng, Liang, Shen, Shaojie

arXiv.org Artificial IntelligenceNov-11-2021

For a long time, the point cloud completion task has been regarded as a pure generation task. After obtaining the global shape code through the encoder, a complete point cloud is generated using the shape priorly learnt by the networks. However, such models are undesirably biased towards prior average objects and inherently limited to fit geometry details. In this paper, we propose a Graph-Guided Deformation Network, which respectively regards the input data and intermediate generation as controlling and supporting points, and models the optimization guided by a graph convolutional network(GCN) for the point cloud completion task. Our key insight is to simulate the least square Laplacian deformation process via mesh deformation methods, which brings adaptivity for modeling variation in geometry details. By this means, we also reduce the gap between the completion task and the mesh deformation algorithms. As far as we know, we are the first to refine the point cloud completion task by mimicing traditional graphics algorithms with GCN-guided deformation. We have conducted extensive experiments on both the simulated indoor dataset ShapeNet, outdoor dataset KITTI, and our self-collected autonomous driving dataset Pandar40. The results show that our method outperforms the existing state-of-the-art algorithms in the 3D point cloud completion task.

cloud, information, point cloud, (14 more...)

arXiv.org Artificial Intelligence

2112.0184

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation (0.34)
Automobiles & Trucks (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.71)
Information Technology > Artificial Intelligence > Robots (0.67)

Add feedback

Bach Style Music Authoring System based on Deep Learning

Kong, Minghe, Huang, Lican

arXiv.org Artificial IntelligenceOct-6-2021

With the continuous improvement in various aspects in the field of artificial intelligence, the momentum of artificial intelligence with deep learning capabilities into the field of music is coming. The research purpose of this paper is to design a Bach style music authoring system based on deep learning. We use a LSTM neural network to train serialized and standardized music feature data. By repeated experiments, we find the optimal LSTM model which can generate imitation of Bach music. Finally the generated music is comprehensively evaluated in the form of online audition and Turing test. The repertoires which the music generation system constructed in this article are very close to the style of Bach's original music, and it is relatively difficult for ordinary people to distinguish the musics Bach authored and AI created.

music, neural network, sequence, (12 more...)

arXiv.org Artificial Intelligence

2110.0264

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.65)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos

Ghose, Sanchita, Prevost, John J.

arXiv.org Artificial IntelligenceJul-20-2021

Deep learning based visual to sound generation systems essentially need to be developed particularly considering the synchronicity aspects of visual and audio features with time. In this research we introduce a novel task of guiding a class conditioned generative adversarial network with the temporal visual information of a video input for visual to sound generation task adapting the synchronicity traits between audio-visual modalities. Our proposed FoleyGAN model is capable of conditioning action sequences of visual events leading towards generating visually aligned realistic sound tracks. We expand our previously proposed Automatic Foley dataset to train with FoleyGAN and evaluate our synthesized sound through human survey that shows noteworthy (on average 81\%) audio-visual synchronicity performance. Our approach also outperforms in statistical experiments compared with other baseline models and audio-visual datasets.

artificial intelligence, machine learning, video, (16 more...)

arXiv.org Artificial Intelligence

2107.09262

Country:

North America > United States > Texas > Bexar County > San Antonio (0.14)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback